Name Phylogeny: A Generative Model of String Variation
Authors
Abstract
Many linguistic and textual processes involve transduction of strings. We show how to learn a stochastic transducer from an unorganized collection of strings (rather than string pairs). The role of the transducer is to organize the collection. Our generative model explains similarities among the strings by supposing that some strings in the collection were not generated ab initio, but were instead derived by transduction from other, “similar” strings in the collection. Our variational EM learning algorithm alternately reestimates this phylogeny and the transducer parameters. The final learned transducer can quickly link any test name into the final phylogeny, thereby locating variants of the test name. We find that our method can effectively find name variants in a corpus of web strings used to refer to persons in Wikipedia, improving over standard untrained distances such as Jaro-Winkler and Levenshtein distance.
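For orientation, the untrained Levenshtein baseline mentioned in the abstract can be sketched as follows. This is not the paper's learned transducer; it is only a minimal illustration of the kind of fixed distance the method is compared against. The function names `levenshtein` and `nearest_variant` and the sample names are hypothetical, chosen here for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance (insert, delete, substitute) between a and b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def nearest_variant(test_name: str, collection: list[str]) -> str:
    """Return the string in the collection closest to test_name (assumed baseline)."""
    return min(collection, key=lambda s: levenshtein(test_name, s))


if __name__ == "__main__":
    names = ["Barack Obama", "George Bush"]  # illustrative data only
    print(nearest_variant("Barak Obama", names))
```

A fixed distance like this treats every edit as equally costly, whereas a trained transducer can in principle learn that some edits are more plausible than others for names.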
Similar resources
A Numerical Simulation Study on Wellbore Temperature Field of Water Injection in Highly Deviated Wells
According to the temperature distribution of water injection well-bore in highly deviated wells under different conditions and unstable temperature field heat conduction principles, a true three-dimensional model was established to analyze the law of variation on temperature of highly deviated wells during the water injection process, and to analyze the factors that influence the water injectio...
A Generative Entity-Mention Model for Linking Entities with Knowledge Base
Linking entities with knowledge base (entity linking) is a key issue in bridging the textual data with the structural knowledge base. Due to the name variation problem and the name ambiguity problem, the entity linking decisions are critically depending on the heterogeneous knowledge of entities. In this paper, we propose a generative probabilistic model, called entity-mention model, which can le...
Haplotype Block Partitioning and tagSNP Selection under the Perfect Phylogeny Model
Single Nucleotide Polymorphisms (SNPs) are the most usual form of polymorphism in human genome. Analyses of genetic variations have revealed that individual genomes share common SNP-haplotypes. The particular pattern of these common variations forms a block-like structure on human genome. In this work, we develop a new method based on the Perfect Phylogeny Model to identify haplo...
A Mechanical Model and its Experimental Verification for a Water Injection String in a Highly Deviated Well
Water injection strings in highly deviated wells are subjected to complex forces on the string bore. In this work, a mechanical model is developed for these forces and for those on downhole tools. On the basis of this model, and taking account of the characteristics of the string in different working conditions, a temperature field model and a pressure field model are introduced, and a statical...
Probability in phonological generalizations: modeling French optional final consonants
The starting point for this paper is the simple and unoriginal observation that many phonological generalizations are variable. By variable, I mean that their application is not predictable from phonological properties of the string, but rather depends probabilistically on other factors. In speech perception, phonological generalizations are particularly variable because of dialectal and inter-...